Data Quality Not Your Typical Database Problem

Author

  • Mourad Ouzzani
Abstract

Textbook database examples are often wrong and simplistic. Unfortunately, data is never born clean or pure. Errors, missing values, repeated entries, inconsistent instances and unsatisfied business rules are the norm rather than the exception. Data cleaning (also known as data cleansing, record linkage and many other terms) is growing as a major application requirement and an interdisciplinary research area. In this talk, we will start by discussing some of the major issues and challenges in creating effective and efficient data cleaning solutions. Then we will discuss some challenges and critique current conservative approaches to this very critical problem. Finally, we will discuss some of our work at QCRI in this area.


Similar resources

Web and Information Technologies

...s of the Invited Talks: Towards Automated Information Factories, Aris M. Ouksel; Data Quality Not Your Typical Database Problem


Data Quality – The Fuel that Drives the Business Engine

In today’s information age, companies will live and die by information. This information is the fuel that drives the business engine. As more and more data is collected, the reality of a multichannel world that includes e-commerce, direct sales, call centers and existing systems sets in. Bad data is affecting companies at an alarming rate and the dilemma is clear: how can a company ensure that i...


The Main Steps to Data Quality

To gain knowledge from your data, the data has to be of high quality. Poor data quality is increasingly a problem for companies that are starting to exploit their data stocks. This article shows the main obstacles on the way to perfect data quality. It is based on our experience in improving data quality in large customer or business partner databases. The examples mentioned in this paper ...


The sequelae of misinterpreting surgical outcome data.

On a normal working day, a plot is presented to you that will change your life forever. The plot (Fig. 1A) represents your variable life-adjusted display (VLAD) curves depicted versus VLAD curves from your ‘competing’ colleagues. This VLAD curve, a real case, a real plot (the surgeon is not an author of this letter), presents a plot of your cumulative sum of the difference in expected and obser...


Completeness in the Relational Model: a Comprehensive Framework

Completeness is a well-known data quality dimension in the area of databases. Intuitively, a database is complete if it represents every fact of the real world coherent with the database semantics, i.e., its intension. In the paper, we provide a comprehensive framework for characterizing completeness in the relational model, investigating several different paradigms typical of database models, s...




Publication date: 2012